megaman: Manifold Learning with Millions of points

نویسندگان

  • James McQueen
  • Marina Meila
  • Jacob VanderPlas
  • Zhongyue Zhang
چکیده

Manifold Learning (ML) is a class of algorithms seeking a low-dimensional non-linear representation of high-dimensional data. Thus ML algorithms are, at least in theory, most applicable to high-dimensional data and sample sizes to enable accurate estimation of the manifold. Despite this, most existing manifold learning implementations are not particularly scalable. Here we present a Python package that implements a variety of manifold learning algorithms in a modular and scalable fashion, using fast approximate neighbors searches and fast sparse eigendecompositions. The package incorporates theoretical advances in manifold learning, such as the unbiased Laplacian estimator introduced by Coifman and Lafon (2006) and the estimation of the embedding distortion by the Riemannian metric method introduced by Perraul-Joncas and Meila (2013). In benchmarks, even on a single-core desktop computer, our code embeds millions of data points in minutes, and takes just 200 minutes to embed the main sample of galaxy spectra from the Sloan Digital Sky Survey — consisting of 0.6 million samples in 3750-dimensions — a task which has not previously been possible.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Megaman: Scalable Manifold Learning in Python

Manifold Learning (ML) is a class of algorithms seeking a low-dimensional non-linear representation of high-dimensional data. Thus, ML algorithms are most applicable to highdimensional data and require large sample sizes to accurately estimate the manifold. Despite this, most existing manifold learning implementations are not particularly scalable. Here we present a Python package that implemen...

متن کامل

خوشه‌بندی داده‌ها بر پایه شناسایی کلید

Clustering has been one of the main building blocks in the fields of machine learning and computer vision. Given a pair-wise distance measure, it is challenging to find a proper way to identify a subset of representative exemplars and its associated cluster structures. Recent trend on big data analysis poses a more demanding requirement on new clustering algorithm to be both scalable and accura...

متن کامل

A Geometry Preserving Kernel over Riemannian Manifolds

Abstract- Kernel trick and projection to tangent spaces are two choices for linearizing the data points lying on Riemannian manifolds. These approaches are used to provide the prerequisites for applying standard machine learning methods on Riemannian manifolds. Classical kernels implicitly project data to high dimensional feature space without considering the intrinsic geometry of data points. ...

متن کامل

Video Subject Inpainting: A Posture-Based Method

Despite recent advances in video inpainting techniques, reconstructing large missing regions of a moving subject while its scale changes remains an elusive goal. In this paper, we have introduced a scale-change invariant method for large missing regions to tackle this problem. Using this framework, first the moving foreground is separated from the background and its scale is equalized. Then, a ...

متن کامل

آموزش منیفلد با استفاده از تشکیل گراف منیفلدِ مبتنی بر بازنمایی تنک

In this paper, a sparse representation based manifold learning method is proposed. The construction of the graph manifold in high dimensional space is the most important step of the manifold learning methods that is divided into local and gobal groups. The proposed graph manifold extracts local and global features, simultanstly. After construction the sparse representation based graph manifold,...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • CoRR

دوره abs/1603.02763  شماره 

صفحات  -

تاریخ انتشار 2016